There are 14 teams in the competition.
There are a total of 370 players.
There are a total of 7 rounds in the competition.
As can be observed, the highest goal scorers are the Kangaroos and Fremantle, which suggests they are more likely to win the 2020 season.
The dataset contains 68 variables, of which 34 are numeric. Because a pairs plot shows the distribution of each single variable (on the diagonal) and the relationship between each pair of variables (off the diagonal), 34 numeric variables would give 34 * 34 = 1156 panels. However, the variable jumper id has been duplicated thrice, which reduces this to 31 * 31 = 961. The total would be 528, which comprises the diagonal panels (433) and the upper and lower triangles.
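The panel arithmetic for a pairs plot can be sketched in a few lines of R. This is only an illustration of the general counting rule for p variables (a p x p grid, p diagonal panels, and p(p-1)/2 distinct variable pairs); the function name `pairs_panel_counts` is hypothetical and not part of the original analysis.

```r
# Count the panels in a pairs plot (scatterplot matrix) of p variables.
# Hypothetical helper, written only to illustrate the counting rule.
pairs_panel_counts <- function(p) {
  stopifnot(is.numeric(p), length(p) == 1, p >= 1)
  c(
    total        = p * p,        # every cell of the p x p grid
    diagonal     = p,            # single-variable distributions
    unique_pairs = choose(p, 2)  # one triangle; the other triangle mirrors it
  )
}

pairs_panel_counts(34)  # total 1156, diagonal 34, unique_pairs 561
pairs_panel_counts(31)  # total 961,  diagonal 31, unique_pairs 465
```

Counting only `unique_pairs` avoids double counting, since the upper and lower triangles show the same pairs with the axes swapped.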
The scagnostics striated and stringy were used to arrive at the L-shaped plots, since striated checks the straightness of the points and stringy checks their dispersion. This yielded the variables hitouts and bounces.
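One way to shortlist variable pairs from a table of scagnostics is to keep the rows that score highly on both measures. The sketch below assumes a data frame with one row per variable pair and columns named `striated` and `stringy`; the toy table, the helper `high_on_both`, and the 0.8 cutoff are all made up for illustration and are not the values from the AFLW data.

```r
# Toy scagnostics table: one row per variable pair.
# All names and values here are invented for illustration only.
scags_toy <- data.frame(
  x        = c("a", "a", "b"),
  y        = c("b", "c", "c"),
  striated = c(0.90, 0.20, 0.85),
  stringy  = c(0.95, 0.40, 0.30),
  stringsAsFactors = FALSE
)

# Keep pairs whose striated AND stringy scores both reach the cutoff.
# The default cutoff of 0.8 is an assumption, not a recommended value.
high_on_both <- function(scags, cutoff = 0.8) {
  scags[scags$striated >= cutoff & scags$stringy >= cutoff, , drop = FALSE]
}

high_on_both(scags_toy)  # only the pair (a, b) passes both cutoffs
```

The shortlisted pairs would still need to be plotted and inspected by eye, since a high score on both measures does not by itself guarantee an L shape.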
The data appear to have a barrier: the values do not go beyond a certain x or y value.
# Shiny
library(shiny)
library(plotly)

# aflw_scags is assumed to already exist in the session.
ui <- fluidPage(
  plotlyOutput("parcoords"),
  verbatimTextOutput("data")
)

server <- function(input, output, session) {
  # Columns 3 to 15 are taken to be the numeric scagnostic measures.
  aflw_num <- aflw_scags[, 3:15]

  output$parcoords <- renderPlotly({
    # One parallel-coordinates axis per measure, each on a fixed 0-1 scale.
    dims <- Map(function(x, y) {
      list(values = x,
           range = c(0, 1),
           label = y)
    }, aflw_num,
    names(aflw_num),
    USE.NAMES = FALSE)

    plot_ly(type = 'parcoords',
            dimensions = dims,
            source = "pcoords") %>%
      layout(margin = list(r = 30)) %>%
      event_register("plotly_restyle")
  })

  # Selected ranges per variable: list(var = list(c(min, max), ...), ...)
  ranges <- reactiveValues()

  observeEvent(event_data("plotly_restyle",
                          source = "pcoords"),
  {
    d <- event_data("plotly_restyle",
                    source = "pcoords")
    # Which dimension (axis) was brushed? The index is zero-based.
    dimension <- as.numeric(stringr::str_extract(names(d[[1]]), "[0-9]+"))
    # Exit early if the restyle event is not tied to a dimension.
    if (!length(dimension) || anyNA(dimension)) return()
    # Shift by one for R's one-based indexing.
    dimension_name <- names(aflw_num)[[dimension + 1]]
    # Several ranges on one axis arrive as a 3D array; store a list of vectors.
    info <- d[[1]][[1]]
    ranges[[dimension_name]] <- if (length(dim(info)) == 3) {
      lapply(seq_len(dim(info)[2]), function(i) info[, i, ])
    } else {
      list(as.numeric(info))
    }
  })

  # Rows that fall inside at least one selected range on every brushed axis.
  aflw_selected <- reactive({
    keep <- TRUE
    for (i in names(ranges)) {
      range_ <- ranges[[i]]
      keep_var <- FALSE
      for (j in seq_along(range_)) {
        rng <- range_[[j]]
        keep_var <- keep_var | dplyr::between(aflw_scags[[i]],
                                              min(rng), max(rng))
      }
      keep <- keep & keep_var
    }
    aflw_scags[keep, ]
  })

  output$data <- renderPrint({
    tibble::as_tibble(aflw_selected())
  })
}

shinyApp(ui, server)
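The row-selection rule used in the app (keep a row if, for every brushed axis, its value lies inside at least one of the selected ranges) can be tried outside Shiny. The following is a minimal base R sketch of that rule; the function `rows_in_ranges` and the toy data are hypothetical and only mirror the logic of the reactive above.

```r
# Return a logical vector marking rows of `data` that satisfy every brush.
# `ranges` is a named list: list(var = list(c(min, max), ...), ...).
rows_in_ranges <- function(data, ranges) {
  keep <- rep(TRUE, nrow(data))
  for (var in names(ranges)) {
    in_any <- rep(FALSE, nrow(data))
    for (rng in ranges[[var]]) {
      # OR across the ranges selected on the same axis
      in_any <- in_any | (data[[var]] >= min(rng) & data[[var]] <= max(rng))
    }
    # AND across the different axes
    keep <- keep & in_any
  }
  keep
}

# Toy values, invented for illustration only.
toy <- data.frame(stringy = c(0.1, 0.5, 0.9), skinny = c(0.2, 0.6, 0.7))
brushes <- list(
  stringy = list(c(0, 0.2), c(0.8, 1)),  # two ranges on one axis
  skinny  = list(c(0.5, 1))
)
toy[rows_in_ranges(toy, brushes), ]  # keeps only the third row
```

With no brushes at all (an empty list) every row is kept, which matches the app showing the full table before any selection is made.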
Clumpy and Convex have relatively lower values when compared to the rest. There seem to be outliers in the convex, skinny and clumpy data. Sparse and Skewed show clumpiness, while the others are more spread out.
Outlying: 0.0 - 0.2; Stringy: 0.6; Striated: 0.2 - 0.8; Skewed: 0.7; Skinny: 0.4; Splines: 0.5
Outlying: > 0.4; Stringy, Striated: > 0.8; Splines: 0
Clumpy and Convex